Using linked data to classify web documents

Author

  • Dominic Fripp
Abstract

Purpose – To find a relationship between traditional faceted classification schemes and semantic web document annotators, particularly in the linked data environment. Design/methodology/approach – A consideration of the conceptual ideas behind faceted classification and linked data architecture is made. Analysis on selected web documents is performed using Calais’ Semantic Proxy to support the considerations. Findings – Technical language aside, the principles of both approaches are very similar. Modern classification techniques have the potential to automatically generate metadata to drive more precise information recall by including a semantic layer. Originality – Linked Data has not been explicitly considered in this context before in the published literature.
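
As a rough illustration of the similarity the abstract describes, the sketch below expresses a faceted description of one document as subject-predicate-object triples, the basic unit of linked data. It is only a minimal sketch: the facet names, the `http://example.org/` URIs, and the helpers `facets_to_triples` and `to_ntriples` are hypothetical and are not taken from the paper or from Calais' Semantic Proxy.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace for facet predicates (an assumption, not from the paper).
FACET_NS = "http://example.org/facet/"


def facets_to_triples(doc_uri: str, facets: Dict[str, List[str]]) -> List[Triple]:
    """Turn a faceted description into (subject, predicate, object) triples.

    Each facet becomes a predicate and each facet value becomes an object,
    so one document with several facets yields several triples.
    """
    triples: List[Triple] = []
    for facet, values in sorted(facets.items()):
        for value in values:
            triples.append((doc_uri, FACET_NS + facet, value))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise triples in an N-Triples-like form, one statement per line.

    Objects that look like HTTP URIs are written as resources; anything
    else is written as a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http://") or o.startswith("https://"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical document and facet values, for illustration only.
    facets = {
        "topic": ["classification"],
        "place": ["http://dbpedia.org/resource/London"],
    }
    print(to_ntriples(facets_to_triples("http://example.org/doc/1", facets)))
```

Linking a facet value to a shared URI rather than a local string is what lets the same classification be reused and queried across datasets.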

Similar resources

Probabilistic Models of Text and Link Structure for Hypertext Classification

Most text classification methods treat each document as an independent instance. However, in many text domains, documents are linked and the topics of linked documents are correlated. For example, web pages of related topics are often connected by hyperlinks and scientific papers from related fields are commonly linked by citations. We propose a unified probabilistic model for both the textual ...

Web Document Classification based on Tagged-Region Progressive Analysis

In this paper, we propose an intelligent web document classification method, called TAgged-Region Progressive Analysis (TARPA). Instead of parsing the whole content of the web page while classifying a web document, TARPA parses the document into finer structured Tagged-Regions and extracts fewer and the most important regions to analyze and classify. If the few important tagged regions are not ...

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Discovering Concealed Semantics in Web Documents Using Fuzzy Clustering By Feature Matrix Methodology

As the data grows exponentially, exploding on the 'World Wide Web', the orthodox clustering algorithms face various challenges, of which the most frequent is uncertainty. Web documents have become heterogeneous and very complex. There exist multiple relations between one web document and others in the form of entrenched links. This can be imagined as a one to many (1...

Use of Linked Data principles for semantic management of scanned documents (Emprego dos princípios Linked Data para gestão semântica de documentos digitalizados)

The study addresses the use of the Semantic Web and Linked Data principles proposed by the World Wide Web Consortium for the development of a Web application for semantic management of scanned documents. The main goal is to record scanned documents, describing them in a way that the machine is able to understand and process them, filtering content and assisting us in searching for such documents when a...

Journal title:
  • Aslib Proceedings

Volume 62  Issue 

Pages  -

Publication date 2010